Credit Card Customer Churn - EDA & Modelling

Introduction

Goals & Objective

Explore and visualize the dataset. Build a classification model to predict whether a customer is going to churn or not. Optimize the model using appropriate techniques. Generate a set of insights and recommendations that will help the bank.

The first goal of this project is to provide an analysis that shows the differences between non-churning and churning customers. This will give us insight into which customers are likely to churn.

The top priority of this case is to identify whether a customer will churn or not. It's important that we don't misclassify churning customers as non-churning ones. That's why the model is evaluated on the recall metric (goal: > 62%).

Libraries

Libraries used can be found in the code block below
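The exact import cell isn't reproduced here, so the following is a representative stack for this kind of notebook (seaborn and xgboost, mentioned later, would be imported the same way):

```python
# Typical imports for this notebook: data handling, plotting, and modelling.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, recall_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split
```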

The Data

Data sample

The building block of any data science project is the data. Below you can find one data record that will be used in further analysis. The dataset consists of 10,000 samples describing the customers and their behavior.

The columns/features can be split into the following groups:

SECTION 1

Data Preprocessing & EDA (Exploratory Data Analysis)

In this phase we'll quickly explore the data and remove or impute incorrect values, so that cleaned data can be used for further analysis and modelling.

Check for duplicates and change ID to ClientNumber
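A minimal sketch of this step, assuming the Kaggle credit-card churn schema where the ID column is called `CLIENTNUM` (a toy frame stands in for the real CSV):

```python
import pandas as pd

# Toy frame standing in for the real dataset; the third row duplicates the first.
df = pd.DataFrame({
    "CLIENTNUM": [768805383, 818770008, 768805383],
    "Customer_Age": [45, 49, 45],
})

# Drop exact duplicate rows, then rename the ID column to ClientNumber.
df = df.drop_duplicates().rename(columns={"CLIENTNUM": "ClientNumber"})
print(df.shape)
```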

Null values?

Null values are found in two columns. These are imputed in the Handling Missing Values step, after which no missing values remain.
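The null check itself is a one-liner; in the sketch below, which two columns carry the nulls is an assumption for illustration:

```python
import numpy as np
import pandas as pd

# Toy frame; the affected column names are assumptions, not taken from the notebook.
df = pd.DataFrame({
    "Education_Level": ["Graduate", np.nan, "High School"],
    "Income_Category": [np.nan, "Less than $40K", "$40K - $60K"],
    "Customer_Age": [45, 49, 51],
})

# Missing-value count per column; report only the columns that have any.
nulls = df.isnull().sum()
print(nulls[nulls > 0])
```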

Descriptive Analysis
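The descriptive analysis boils down to `DataFrame.describe()`; a small sketch on toy numbers:

```python
import pandas as pd

# Toy numeric sample standing in for the full dataset.
df = pd.DataFrame({
    "Customer_Age": [45, 49, 51, 40],
    "Credit_Limit": [12691.0, 8256.0, 3418.0, 3313.0],
})

# Count, mean, std and quartiles for every numeric feature.
summary = df.describe()
print(summary)
```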

Observations:

Data Visualization (EDA)

Checking the demographic variables

Age compared to the churn
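One way to compare age across the two groups, assuming the target column is named `Attrition_Flag` as in the Kaggle dataset (a plot such as a histogram per group would show the same thing visually):

```python
import pandas as pd

# Toy sample; the Attrition_Flag column name is an assumption from the Kaggle schema.
df = pd.DataFrame({
    "Attrition_Flag": ["Existing Customer", "Attrited Customer",
                       "Existing Customer", "Attrited Customer"],
    "Customer_Age": [47, 46, 48, 45],
})

# Mean age per churn group; in the full data both groups sit around 46-47 years.
mean_age = df.groupby("Attrition_Flag")["Customer_Age"].mean()
print(mean_age)
```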

Univariate Analysis And Bivariate Analysis

Univariate Analysis

Gender vs churn

Number of dependents vs churn

Education level vs churn

Income category vs churn

Checking the product variables

Types of cards vs churn

Bivariate Analysis

Relationship with bank vs churn

Number of products bought vs churn

Months inactive vs churn

Number of contacts vs churn

Credit limit vs churn

Total revolving balance vs churn

Open To Buy Credit Line vs churn

Change in Transaction vs Churn

Total transaction amount vs churn

Total transaction count vs Churn

Change in transaction count vs Churn

Average Card Utilization Ratio

Correlation
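A sketch of the correlation step (column names assumed from the Kaggle dataset; open-to-buy is the credit limit minus the revolving balance, so a strong correlation between those two is expected):

```python
import pandas as pd

# Toy numeric columns named after the dataset's features (an assumption).
df = pd.DataFrame({
    "Credit_Limit": [12691.0, 8256.0, 3418.0, 3313.0],
    "Avg_Open_To_Buy": [11914.0, 7392.0, 2640.0, 2598.0],
    "Total_Trans_Ct": [42, 33, 20, 20],
})

# Pairwise Pearson correlations; sns.heatmap(corr, annot=True) would visualize this.
corr = df.corr()
print(corr.round(2))
```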

Non Churn and Churn Profiles

| Feature | Non-Churning Customer | Churning Customer |
|---|---|---|
| **Demographic variables** | | |
| Age | 47 | 46 |
| Gender | F/M | F/M |
| Dependents | 2 | 2 |
| Education Level | Graduate | Graduate |
| Marital Status | Married/Single | Married/Single |
| Income Category | Less than \$40K | Less than \$40K |
| **Product variables** | | |
| Type Of Card | Blue | Blue |
| Length Of Relationship | 36 months | 36 months |
| Products Bought | 4 | 3 |
| Inactive Months | 2 | 3 |
| Number Of Contacts | 2 | 3 |
| Credit Limit | \$8726 | \$8136 |
| Revolving Balance | 1256 | 672 |
| Open To Buy Credit Line | 7470 | 7463 |
| Transaction Amount Change | 0.77 | 0.69 |
| Total Transaction Amount | 4650 | 3095 |
| Total Transaction Count | 69 | 45 |
| Transaction Count Change | 0.74 | 0.55 |
| Card Utilization Ratio | 0.30 | 0.16 |

SECTION 2

Feature Selection

Handling Missing Values

Customer Churn Prediction

Here we will train an optimized (tree-based) model that will predict whether a customer will churn or not.

Data Preparation

Before we start training a model we must prepare our data. Steps we can undertake:

In this notebook we'll focus on the upsampling method. The data wrangling is performed to make sure that the upsampling is done correctly.

SMOTE (Synthetic Minority Oversampling Technique)

We saw that our dataset is imbalanced. This can cause problems when building a classification model, since it might not learn the decision boundary for the minority class. This can of course be solved with upsampling.

One technique used for this is SMOTE, which creates new synthetic samples that can be used for training.

SMOTE first selects a minority class instance a at random and finds its k nearest minority class neighbors. The synthetic instance is then created by choosing one of the k nearest neighbors b at random and connecting a and b to form a line segment in the feature space. The synthetic instances are generated as a convex combination of the two chosen instances a and b.

To use SMOTE we'll need to encode our categorical features.

Note: it's important to upsample only the training data, so that no synthetic data is present in the validation dataset.

Categorical to numeric
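A minimal encoding sketch using one-hot encoding via `pd.get_dummies` (column names assumed from the Kaggle dataset):

```python
import pandas as pd

# Toy categorical columns (names are assumptions for illustration).
df = pd.DataFrame({
    "Gender": ["F", "M", "F"],
    "Card_Category": ["Blue", "Silver", "Blue"],
})

# One-hot encode so SMOTE and the models can consume the features.
encoded = pd.get_dummies(df, columns=["Gender", "Card_Category"], drop_first=True)
print(encoded.columns.tolist())
```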

Splitting the data

Section 3 Model building

Model training on the original data

Logistic Regression, Decision Tree Classifier, Random Forest, Gradient Boosting, AdaBoost Classifier, XGBoost, Bagging Classifier
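The training loop can be sketched as below, on synthetic data; XGBoost is omitted here only because it lives in a separate package, and recall is the metric this project optimizes for:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced data standing in for the prepared churn features.
X, y = make_classification(n_samples=600, weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "Bagging": BaggingClassifier(random_state=0),
}

# Fit every candidate and record its recall on the held-out fold.
recalls = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    recalls[name] = recall_score(y_te, model.predict(X_te))
    print(f"{name}: recall = {recalls[name]:.3f}")
```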

RandomForestClassifier

GradientBoostingClassifier

Decision Tree Classification Model

Logistic Regression

Bagging classifier

AdaBoostClassifier

XGBoost classifier

Section 4 Model building - Oversampled data

Model training on the oversampled data

Logistic Regression, Decision Tree Classifier, Random Forest, Gradient Boosting, AdaBoost Classifier, XGBoost, Bagging Classifier

RandomForestClassifier

Section 5 Model building - Under sampled data

Undersampling the data

Model training on the undersampled data

Logistic Regression, Decision Tree Classifier, Random Forest, Gradient Boosting, AdaBoost Classifier, XGBoost, Bagging Classifier

Section 6

Model Tuning

Gradient Boosting Classifier, Logistic Regression, Decision Tree

We tune these models because Logistic Regression reached around 88% accuracy while Bagging and Gradient Boosting reached around 94-95%; as they were among the strongest baselines, we selected these three for further tuning.

Results of all Models

Section 7 & 8 Hyperparameter tuning using random search and comparing model performances

It's clear that the XGBoostClassifier performs best. With a recall of 92.5% we clearly reached our goal (goal: > 62%).

Hyperparameter tuning

RandomizedSearchCV

First we'll use RandomizedSearchCV to narrow down the most promising parameters. For further fine-tuning, GridSearchCV will be used.
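A sketch of the randomized search with a hypothetical parameter grid (the original search space isn't shown in this outline), scored on recall since that is the project's target metric:

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

# Small synthetic binary-classification problem for illustration.
X, y = make_classification(n_samples=400, random_state=0)

# Hypothetical search space; the notebook's actual grid may differ.
param_dist = {
    "n_estimators": randint(50, 300),
    "max_depth": randint(2, 6),
    "learning_rate": [0.01, 0.05, 0.1, 0.2],
}

search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_distributions=param_dist,
    n_iter=5,            # small for illustration; use more in practice
    scoring="recall",    # the project's target metric
    cv=3,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```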

Section 9 Productionize the model

Machine Learning Pipelines

The main purpose is to codify and automate the workflow it takes to produce a machine learning model: as you have seen, we transformed the data, added new features, removed outliers and more. Since the core of data science is experimentation, a good infrastructure increases the number of iterations you can test.

Most of the time, in a first iteration data scientists focus on producing a model that solves a single business problem, don't invest much time in building the architecture, and tend to start with a manual workflow. Once the model is integrated into production, however, this way of working no longer fits: the iteration cycle is too slow, and a manual process carries risk.

That being said, we can construct pipelines that help us pre-process the data and train the model.
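A minimal `Pipeline` sketch combining preprocessing and the model (feature names are assumptions for illustration):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy frame; the column names are assumed, not taken from the notebook.
X = pd.DataFrame({
    "Customer_Age": [45, 49, 51, 40],
    "Gender": ["F", "M", "F", "M"],
})
y = [0, 1, 0, 1]

# Scale numeric features and one-hot encode categoricals in one step.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["Customer_Age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["Gender"]),
])

# Chaining preprocessing and model means one object to fit, predict, and deploy.
pipe = Pipeline([
    ("preprocess", preprocess),
    ("model", GradientBoostingClassifier(random_state=0)),
])
pipe.fit(X, y)
preds = pipe.predict(X)
print(preds)
```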

Section 10

Conclusion

We can conclude that the top 3 most influential features are the product variables:

Recommendations & Future Improvements

With the existing consumer insights through data, companies can predict customers' possible needs and issues, define proper strategies and solutions, meet their expectations and retain their business. Based on predictive analysis and modelling, businesses can focus their attention with a targeted approach by segmenting customers and offering them customized solutions. Analyzing how and when churn happens in the customer lifecycle will allow the company to come up with more preemptive measures.